CSAR-2: a Case Study of Parallel File System Dependability
Abstract
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth, but they are prone to data loss because the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage efficiency, and level of fault tolerance. However, the actual dependability of an enhanced striped file system is determined by more than the redundancy scheme adopted: it also depends on factors such as the type of fault detection mechanism and the nature and speed of recovery. In this paper we address the question of how to assess the dependability of CSAR, a version of PVFS augmented with a RAID5 distributed redundancy scheme that we described in previous work. First, we address the issues encountered in adding fault detection and recovery mechanisms to CSAR in order to produce CSAR-2. Second, we build a reliability model of the new system with parameters obtained from a CSAR-2 prototype and from the literature. Finally, we assess the system and discuss some interesting observations that can be made with the help of the model. According to our analysis, a representative configuration shows four-nines reliability; the sensitivity analysis shows that a 16% reduction in system outage time can be obtained by speeding up reconstruction with a faster network.
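To illustrate why reconstruction speed matters in a model of this kind, here is a minimal sketch of the textbook three-state Markov model for a single-parity (RAID5-style) group. It is not the CSAR-2 model from the paper: the function name `mttdl_single_parity` and the values for group size, disk MTTF, and repair time are hypothetical, chosen only for illustration.

```python
def mttdl_single_parity(n_disks: int, mttf_hours: float, mttr_hours: float) -> float:
    """Mean time to data loss for one single-parity group (illustrative only).

    Textbook Markov chain: state 0 (all disks up) -> state 1 (one disk down)
    at rate n*lam; state 1 -> state 0 at repair rate mu; state 1 -> data loss
    at rate (n-1)*lam. Solving for the mean absorption time from state 0 gives
    MTTDL = ((2n - 1)*lam + mu) / (n*(n - 1)*lam**2).
    """
    lam = 1.0 / mttf_hours   # per-disk failure rate (assumed exponential)
    mu = 1.0 / mttr_hours    # reconstruction rate (assumed exponential)
    return ((2 * n_disks - 1) * lam + mu) / (n_disks * (n_disks - 1) * lam ** 2)


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper.
    slow = mttdl_single_parity(n_disks=8, mttf_hours=100_000.0, mttr_hours=10.0)
    fast = mttdl_single_parity(n_disks=8, mttf_hours=100_000.0, mttr_hours=5.0)
    print(f"MTTDL with 10 h rebuild: {slow:.3e} h")
    print(f"MTTDL with  5 h rebuild: {fast:.3e} h")
    print(f"improvement factor: {fast / slow:.3f}")
```

When repair is much faster than failure, the expression reduces to the familiar approximation MTTF² / (n(n−1)·MTTR), so halving the rebuild time roughly doubles the mean time to data loss in this simplified setting. A full dependability model, such as the one the abstract describes, would also account for fault detection latency and server failures, which this sketch omits.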
Similar resources
CSAR-2: A Case Study of Parallel File System Dependability Analysis
Modern cluster file systems such as PVFS that stripe files across multiple nodes have been shown to provide high aggregate I/O bandwidth, but they are prone to data loss because the failure of a single disk or server affects the whole file system. To address this problem, a number of distributed data redundancy schemes have been proposed that represent different trade-offs between performance, storage effici...
CSAR: Cluster Storage with Adaptive Redundancy
Striped file systems such as the Parallel Virtual File System (PVFS) deliver high-bandwidth I/O to applications running on clusters. An open problem of existing striped file systems is how to provide efficient data redundancy to decrease their vulnerability to disk failures. In this paper we describe CSAR, a version of PVFS augmented with a novel redundancy scheme that addresses the efficiency ...
Chemical Reaction Effects on Bio-Convection Nanofluid flow between two Parallel Plates in Rotating System with Variable Viscosity: A Numerical Study
In the present work, a mathematical model is developed and analyzed to study the influence of nanoparticle concentration through Brownian motion and thermophoresis diffusion. The governing system of PDEs is transformed into coupled non-linear ODEs by using suitable variables. The converted equations are then solved by using a robust shooting method with the help of MATLAB (bvp4c). The impacts o...
An Experimental Evaluation of the Coda
Experimental evaluation is an important way to assess distributed systems, and fault injection is the dominant technique in this area for the evaluation of a system’s dependability. For distributed systems, network failure is an important fault model. Physical network failures often have far-reaching effects, giving rise to multiple correlated failures as seen by higher-level protocols. This th...
A High Performance Redundancy Scheme for Cluster File Systems
A known problem in the design of striped file systems is their vulnerability to disk failures. In this paper we address the challenges of augmenting an existing file system with traditional RAID redundancy, and we propose a novel hybrid redundancy scheme designed to maximize disk throughput as seen by the applications. To demonstrate the hybrid redundancy scheme we build CSAR, a proof-of-concep...